202 research outputs found

    Characterizing Self-Developing Biological Neural Networks: A First Step Towards their Application To Computing Systems

    Carbon nanotubes are often seen as the only alternative technology to silicon transistors. While they are the most likely short-term one, other longer-term alternatives should be studied as well. While contemplating biological neurons as an alternative component may seem preposterous at first sight, significant recent progress in CMOS-neuron interfaces suggests this direction may not be unrealistic; moreover, biological neurons are known to self-assemble into very large networks capable of complex information processing tasks, something that has yet to be achieved with other emerging technologies. The first step to designing computing systems on top of biological neurons is to build an abstract model of self-assembled biological neural networks, much like computer architects manipulate abstract models of transistors and circuits. In this article, we propose a first model of the structure of biological neural networks. We provide empirical evidence that this model matches the biological neural networks found in living organisms, and exhibits the small-world graph structure properties commonly found in many large and self-organized systems, including biological neural networks. More importantly, we extract the simple local rules and characteristics governing the growth of such networks, enabling the development of potentially large but realistic biological neural networks, as would be needed for complex information processing/computing tasks. Based on this model, future work will be targeted to understanding the evolution and learning properties of such networks, and how they can be used to build computing systems.
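The small-world claim above can be illustrated with a toy experiment (a hypothetical sketch, not the authors' model; all function names are illustrative): build a ring lattice with purely local connectivity, randomly rewire a small fraction of edges, and check the two small-world signatures, high clustering together with a short average path length.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node connects to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    return adj

def rewire(adj, p, seed=0):
    """Watts-Strogatz-style rewiring: each edge is moved with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    for u in range(n):
        for v in sorted(adj[u]):          # snapshot, so new edges are not revisited
            if v > u and rng.random() < p:
                w = rng.randrange(n)
                if w != u and w not in adj[u]:
                    adj[u].discard(v); adj[v].discard(u)
                    adj[u].add(w); adj[w].add(u)
    return adj

def clustering(adj):
    """Average local clustering coefficient."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Average BFS distance over all reachable ordered pairs."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(200, 4)
small_world = rewire(ring_lattice(200, 4), p=0.1)
print(clustering(lattice), avg_path_length(lattice))
print(clustering(small_world), avg_path_length(small_world))
```

The rewired graph keeps most of the lattice's clustering while its average path length collapses, which is the small-world signature the abstract refers to.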

    ArchExplorer for automatic design space exploration

    Growing architectural complexity and stringent time-to-market constraints suggest the need to move architecture design beyond parametric exploration to structural exploration. ArchExplorer is a Web-based, permanent and open design-space exploration framework that lets researchers compare their designs against others. The authors demonstrate their approach by exploring the design space of an on-chip memory subsystem and a multicore processor.

    Mathematical justification of the hydrostatic approximation in the primitive equations of geophysical fluid dynamics

    Geophysical fluids all exhibit a common feature: their aspect ratio (depth to horizontal width) is very small. This leads to an asymptotic model widely used in meteorology, oceanography, and limnology, namely the hydrostatic approximation of the time-dependent incompressible Navier–Stokes equations. It relies on the hypothesis that pressure increases linearly in the vertical direction. In the following, we prove a convergence and existence theorem for this model by means of anisotropic estimates and a new time-compactness criterion. (Funding: Fonds Franco-Espagnol D.R.E.I.F.; Ministerio de Educación y Ciencia.)
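The hydrostatic hypothesis the abstract refers to can be stated compactly (standard notation, not necessarily the paper's): in the small-aspect-ratio limit, the vertical momentum equation degenerates to a balance between the pressure gradient and gravity,

```latex
% Hydrostatic approximation: vertical momentum reduces to
% pressure-gravity balance, hence pressure is affine in z.
\frac{\partial p}{\partial z} = -\rho g
\quad\Longrightarrow\quad
p(x, y, z, t) = p_s(x, y, t) + \rho g\,(z_s - z),
```

so pressure is indeed linear (affine) in the vertical coordinate z, with p_s the surface pressure and z_s the surface height.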

    Chaos in computer performance

    Modern computer microprocessors are composed of hundreds of millions of transistors that interact through intricate protocols. Their performance during program execution may be highly variable and present aperiodic oscillations. In this paper, we apply current nonlinear time series analysis techniques to the performance of modern microprocessors during the execution of prototypical programs. Our results provide strong evidence that the highly variable performance dynamics observed during the execution of several programs display low-dimensional deterministic chaos, with sensitivity to initial conditions comparable to textbook models. Taken together, these results show that the instantaneous performance of modern microprocessors constitutes a complex (or at least complicated) system and would benefit from analysis with modern tools of nonlinear and complexity science.
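As a hedged illustration of the kind of nonlinear time-series quantity involved (computed here on a textbook model, not on the authors' microprocessor traces), the sketch below estimates the largest Lyapunov exponent of the logistic map; a positive exponent is the standard signature of sensitivity to initial conditions.

```python
import math

def logistic(x, r=4.0):
    """One step of the logistic map x -> r x (1 - x)."""
    return r * x * (1.0 - x)

def lyapunov(x0, r=4.0, n=100_000, burn=1_000):
    """Largest Lyapunov exponent: average log-derivative along the orbit."""
    x = x0
    for _ in range(burn):          # discard the transient
        x = logistic(x, r)
    acc = 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1.0 - 2.0 * x)))   # log |f'(x)|
        x = logistic(x, r)
    return acc / n

print(lyapunov(0.123456))  # ≈ 0.693 (ln 2) for the fully chaotic map at r = 4
```

For r = 4 the exact value is ln 2, so the numerical estimate is easy to sanity-check; applied to a measured performance trace, the analogous estimate requires first reconstructing the attractor by delay embedding, which this sketch omits.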

    CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs

    Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, including those with complex control flow and data structures. Yet writing parallel programs is a notoriously difficult task. Beyond processor performance, the architect can help by facilitating the task of the programmer, especially by simplifying the model exposed to the programmer.
    Among the many issues associated with writing parallel programs, this article focuses on finding the appropriate parallelism granularity and on efficiently mapping tasks with complex control and data flow to threads. We propose to relieve the user and compiler of both tasks by delegating the parallelization decision to the architecture at run-time, through a combination of hardware and software support and a tight dialogue between both.
    For the software support, we leverage an increasingly popular approach in software engineering called component-based programming; the component contract assumes tight encapsulation of code and data for easy manipulation. Previous research has shown that it is possible to augment components with the ability to split/spawn, providing a simple and fitting approach for programming parallel applications with complex control and data structures. However, such environments still require the programmer to determine the appropriate granularity of parallelism, and spawning incurs significant overheads due to software run-time system management.
    For that purpose, we provide an environment with the ability to spawn conditionally depending on available hardware resources, and we delegate spawning decisions and actions to the architecture. This conditional spawning is implemented through frequent hardware resource probing by the program, which in turn enables rapid adaptation to varying workload conditions, data sets and hardware resources. Furthermore, thanks to appropriate combined hardware and compiler support, the probing has no significant overhead on program performance.
    We demonstrate this approach on an 8-context SMT, several non-trivial algorithms and re-engineered SPEC CINT2000 benchmarks, written using component syntax processed by our toolchain. We achieve speedups ranging from 1.1 to 3.0 on our test suite.
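The conditional-spawning idea can be mimicked in software (a minimal sketch: `maybe_spawn` and the semaphore standing in for hardware contexts are hypothetical, whereas the paper relies on hardware and compiler support for the probe): a task is spawned in a new thread only if a context is free, and otherwise degrades to sequential, inline execution.

```python
import threading

# Hypothetical resource probe: a counting semaphore standing in for the
# hardware contexts the architecture would expose (8-context SMT in the paper).
contexts = threading.BoundedSemaphore(8)

def maybe_spawn(task, *args):
    """Spawn task in a new thread only if a context is free; else run inline."""
    if contexts.acquire(blocking=False):
        def wrapper():
            try:
                task(*args)
            finally:
                contexts.release()        # free the context when the task ends
        t = threading.Thread(target=wrapper)
        t.start()
        return t
    task(*args)                           # no free context: run sequentially
    return None

results = {}
lock = threading.Lock()

def partial_sum(lo, hi, key):
    s = sum(range(lo, hi))
    with lock:
        results[key] = s

# 16 tasks compete for 8 contexts; the surplus runs inline in the caller.
threads = [maybe_spawn(partial_sum, i * 1000, (i + 1) * 1000, i) for i in range(16)]
for t in threads:
    if t is not None:
        t.join()
print(sum(results.values()))  # 127992000 == sum(range(16000))
```

The point of the sketch is the non-blocking probe: spawning adapts to however many contexts happen to be free, which is the behaviour the paper obtains with hardware probing at far lower overhead.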

    A Sampling Method Focusing on Practicality

    In the past few years, several research works have demonstrated that sampling can drastically speed up architecture simulation, and several of these sampling techniques are already largely used. However, for a sampling technique to be both easily and properly used, i.e., plugged into many simulators and used reliably with little or no effort or knowledge from the user, it must fulfill a number of conditions: it should require no hardware-dependent modification of the functional or timing simulator, and it should simultaneously consider warm-up and sampling, while still delivering high speed and accuracy.
    The motivation for this article is that, with the advent of generic and modular simulation frameworks like ASIM, SystemC, LSE, MicroLib or UniSim, there is a need for sampling techniques with the aforementioned properties, i.e., which are almost entirely transparent to the user and simulator agnostic. In this article, we propose a sampling technique focused more on transparency than on speed and accuracy, though the technique delivers almost state-of-the-art performance. Our sampling technique is a hardware-independent and integrated approach to warm-up and sampling; it requires no modification of the functional simulator and solely relies on the performance simulator for warm-up.
    We make the following contributions: (1) a technique for splitting the execution trace into a potentially very large number of variable-size regions to capture program dynamic control flow, (2) a clustering method capable of efficiently coping with such a large number of regions, and (3) a budget-based method for jointly considering warm-up and sampling costs, presenting them as a single parameter to the user, and for distributing the number of simulated instructions between warm-up and sampling based on the region partitioning and clustering information.
    Overall, the method achieves an accuracy/time tradeoff that is close to the best reported results using clustering-based sampling (though usually with perfect or hardware-dependent warm-up), with an average CPI error of 1.68% and an average number of simulated instructions of 288 million over the SPEC benchmarks. The technique/tool can be readily applied to a wide range of benchmarks, architectures and simulators, and will be used as a sampling option of the UniSim modular simulation framework.
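As a rough illustration of the budget idea (all names and the feature vectors below are hypothetical; the paper's region partitioning and clustering are far more elaborate), one can cluster regions by a small feature vector, pick one representative per cluster, and split a single instruction budget between warm-up and detailed simulation of the representatives:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over small feature tuples (a stand-in for BBV clustering)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids, clusters

def plan(regions, features, budget, k=3, warmup_share=0.3):
    """Return (region_start, warmup_insns, sim_insns) per cluster representative.

    The single user-visible knob is `budget`: total simulated instructions,
    split between warm-up and detailed simulation by `warmup_share`.
    """
    cents, clusters = kmeans(features, k)
    reps = []
    for i, cl in enumerate(clusters):
        if not cl:
            continue
        rep = min(cl, key=lambda q: sum((a - b) ** 2 for a, b in zip(q, cents[i])))
        reps.append(regions[features.index(rep)])
    warm = int(budget * warmup_share) // len(reps)
    sim = int(budget * (1 - warmup_share)) // len(reps)
    return [(start, warm, min(length, sim)) for start, length in reps]
```

The essential property mirrored here is that the user states one number, the budget, and the tool decides how to distribute it across representatives and between warm-up and sampling.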

    Quick and practical run-time evaluation of multiple program optimizations

    This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications. Considering 5 representative SpecFP2000 benchmarks, our approach speeds up iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations bring an average speed-up of 1.4.
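The versioning half of the approach can be sketched as follows (hypothetical names; real phase detection and compiler-generated variants are omitted): during a stable phase, time each semantically equivalent version of a hot function once, then commit to the fastest for the remainder of the phase.

```python
import time

def sum_naive(data):
    """One candidate version of the hot function."""
    total = 0
    for x in data:
        total += x
    return total

def sum_builtin(data):
    """Another version with identical semantics."""
    return sum(data)

def autotune(versions, data, trials=5):
    """Time each version and return the fastest (run-time version selection)."""
    reference = versions[0](data)
    timings = {}
    for f in versions:
        assert f(data) == reference        # versions must agree semantically
        start = time.perf_counter()
        for _ in range(trials):
            f(data)
        timings[f] = time.perf_counter() - start
    return min(timings, key=timings.get)

data = list(range(100_000))
best = autotune([sum_naive, sum_builtin], data)
print(best.__name__)
```

In the article the measurement comes from low-overhead instrumentation and the candidate versions from the compiler; the sketch only conveys why stable phases matter, since comparing timings across versions is meaningful only while the workload's behaviour stays constant.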